35 research outputs found

    Incremental Clustering: The Case for Extra Clusters

    The explosion in the amount of data available for analysis often necessitates a transition from batch to incremental clustering methods, which process one element at a time and typically store only a small subset of the data. In this paper, we initiate the formal analysis of incremental clustering methods, focusing on the types of cluster structure that they are able to detect. We find that the incremental setting is strictly weaker than the batch model, proving that a fundamental class of cluster structures that can readily be detected in the batch setting is impossible to identify using any incremental method. Furthermore, we show how the limitations of incremental clustering can be overcome by allowing additional clusters.
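As an illustration of what "process one element at a time and store only a small subset of the data" can look like, here is a minimal sketch of sequential k-means, a common textbook incremental method. It is not taken from the paper and is not necessarily one of the methods it analyses; the function name `sequential_kmeans` and the choice to seed the centers with the first k points are assumptions made for this example.

```python
from typing import Iterable, List, Sequence


def sequential_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """Single pass over `stream`, keeping only k centers and their counts.

    Assumption for this sketch: the first k points seed the centers.
    """
    centers: List[List[float]] = []
    counts: List[int] = []
    for x in stream:
        if len(centers) < k:
            # Seed phase: the first k points become the initial centers.
            centers.append(list(x))
            counts.append(1)
            continue
        # Assign x to the nearest center (squared Euclidean distance).
        j = min(
            range(k),
            key=lambda i: sum((c - a) ** 2 for c, a in zip(centers[i], x)),
        )
        counts[j] += 1
        # Running-mean update: move the center toward x by 1 / count.
        centers[j] = [c + (a - c) / counts[j] for c, a in zip(centers[j], x)]
    return centers


if __name__ == "__main__":
    data = [(0, 0), (10, 10), (0, 2), (10, 12)]
    print(sequential_kmeans(data, k=2))
```

Because each point is seen once and then discarded, the result depends on arrival order: if the first k points all come from the same true cluster, the seeds start badly placed and the method cannot revisit earlier points to correct this.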

    Uncovering Group Level Insights with Accordant Clustering

    Clustering is a widely used data mining tool, which aims to discover partitions of similar items in data. We introduce a new clustering paradigm, accordant clustering, which enables the discovery of (predefined) group level insights. Unlike previous clustering paradigms that aim to understand relationships amongst the individual members, the goal of accordant clustering is to uncover insights at the group level through the analysis of their members. Group level insight can often support a call to action that cannot be informed through previous clustering techniques. We propose the first accordant clustering algorithm, and prove that it finds near-optimal solutions when data possesses inherent cluster structure. The insights revealed by accordant clusterings enabled experts in the field of medicine to isolate successful treatments for a neurodegenerative disease, and those in finance to discover patterns of unnecessary spending. Comment: accepted to SDM 2017 (oral

    A Theoretical Study of Clusterability and Clustering Quality

    Clustering is a widely used technique, with applications ranging from data mining, bioinformatics and image analysis to marketing, psychology, and city planning. Despite the practical importance of clustering, there is very limited theoretical analysis of the topic. We make a step towards building theoretical foundations for clustering by carrying out an abstract analysis of two central concepts in clustering: clusterability and clustering quality. We compare a number of notions of clusterability found in the literature. While all these notions attempt to measure the same property, and all appear to be reasonable, we show that they are pairwise inconsistent. In addition, we give the first computational complexity analysis of a few notions of clusterability. In the second part of the thesis, we discuss how the quality of a given clustering can be defined (and measured). Users often need to compare the quality of clusterings obtained by different methods. Perhaps more importantly, users need to determine whether a given clustering is good enough to be used in further data mining analysis. We analyze what a measure of clustering quality should look like. We do that by introducing a set of requirements ('axioms') for clustering quality measures. We propose a number of clustering quality measures that satisfy these requirements.

    Towards Theoretical Foundations of Clustering

    Clustering is a central unsupervised learning task with a wide variety of applications. Unlike in supervised learning, different clustering algorithms may yield dramatically different outputs for the same input sets. As such, the choice of algorithm is crucial. When selecting a clustering algorithm, users tend to focus on cost-related considerations, such as running times, software purchasing costs, etc. Yet differences concerning the output of the algorithms are a more fundamental consideration. We propose an approach for selecting clustering algorithms based on differences in their input-output behaviour. This approach relies on identifying significant properties of clustering algorithms and classifying algorithms based on the properties that they satisfy. We begin with Kleinberg's impossibility result, which relies on concise abstract properties that are well-suited for our approach. Kleinberg showed that three specific properties cannot be satisfied by the same algorithm. We illustrate that the impossibility result is a consequence of the formalism used, proving that these properties can be formulated without leading to inconsistency in the context of clustering quality measures or algorithms whose input requires the number of clusters. Combining Kleinberg's properties with newly proposed ones, we provide an extensive property-based classification of common clustering paradigms. We use some of these properties to provide a novel characterization of the class of linkage-based algorithms. That is, we distil a small set of properties that uniquely identify this family of algorithms. Lastly, we investigate how the output of algorithms is affected by the addition of small, potentially adversarial, sets of points. We prove that given clusterable input, the output of k-means is robust to the addition of a small number of data points. On the other hand, clusterings produced by many well-known methods, including linkage-based techniques, can be changed radically by adding a small number of elements.
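The sensitivity of linkage-based methods to a few added elements can be seen in a toy setting. The sketch below is an assumption-laden illustration, not the thesis's construction: it implements plain single linkage on one-dimensional points, stopped at k clusters, and the helper name `single_linkage` and the example data are hypothetical.

```python
from typing import List


def single_linkage(points: List[float], k: int) -> List[List[float]]:
    """Agglomerative single linkage on 1-D points, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    base = [0, 1, 2, 10, 11, 12]
    print(single_linkage(base, k=2))          # two well-separated groups
    print(single_linkage(base + [100], k=2))  # one distant point added
```

On the base data, the two groups are recovered. After adding the single distant point 100, the same procedure with k = 2 merges the two original groups and isolates the new point, so one added element changes the clustering of every original point's neighbourhood.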

    Co-Creative Songwriting for Bereavement Support

    Self-expression is essential to processing our thoughts and feelings and is central to successful mental health therapy. Art therapy provides a wider range of expressive mechanisms than offered through traditional approaches, allowing individuals to process their emotions when traditional therapies prove unsuccessful. Yet, effective expression through art therapy may call on a level of artistic experience that is not available to all. As such, a lack of expertise or comfort with artistic expression may hinder one’s ability to receive needed mental health support. Creative machines can offer novel therapeutic approaches by offloading the need for creative expertise and opening up creative self-expression to those who lack the corresponding experience. In this paper, we focus on bereavement, and explore a co-creative songwriting system, ALYSIA, as a new form of therapy for those who have recently suffered the loss of a loved one. We evaluate the utility of this creative system in aiding bereaved individuals through several case studies. In addition, we discuss the utility of co-creative systems in the therapeutic context, with potential application to a broad range of therapies.